一、簡介

  • 資料庫:postgreSQL 14
  • 編輯器:pgAdmin4 6.1版本

主要用SQL的指令先將我們所需要的資料從資料庫中取出,再用R將資料整理成我們要的形式。



二、PostgreSQL

1. 開啟

1.1 開啟pdAdmin4

1.2 輸入伺服器密碼

1.3 點選伺服器

  • 點選左上角的Browser底下的Servers

1.4 輸入資料庫密碼

(通常和伺服器密碼相同)

1.5 進入pgAdmin4主畫面

2. 查看資料表

2.1 點選左側的資料庫展開

Servers>>PostgreSQL 14>>Databases>>CIQ_Target>>Schemas>>public>>Tables

  • 依以上順序展開即可查看CIQ_Target中的所有資料表。

  • 在左側選單中對想要查看的表格按右鍵

  • View/Edit Data>>All Rows可察看完整資料表

2.2 指令

  • 在左側選單點選資料庫CIQ_Target
  • 點選Query tool

  • 在Query Editor寫下指令
select * from table_name -- *代表選擇這張表格的所有欄位
select column1, column2, column3 from table_name -- 也可以選擇其中多個欄位
  • 若我想查看ciqgvkeyiid的所有欄位,就可使用以下指令
select * from ciqgvkeyiid

  • 按下F5

  • 右下角會跳出綠底的指令成功資訊


3. 其他指令

在這個使用說明書中只會介紹可能會用到或較常用的指令


3.1 select distinct

  • 使用方法和select完全一樣,只是就會選取到欄位中不重複的資料
select distinct column1, column2, column3 from table_name
  • 範例:
select distinct * from ciqgvkeyiid


3.2 as

  • 主要用在名稱的變換 (更改欄或表格的名稱)
  • as也可以不寫
select column1 as new_column1 from table_name -- 欄的名稱
select column1 new_column1 from table_name -- 欄的名稱
select column1 from table_name as new_table_name -- 表的名稱
select column1 from table_name new_table_name -- 表的名稱
  • 範例:
select 
symbolid sid, 
gvkey GVK, 
iid as id, 
relatedcompanyid comid, 
exchangeid exid, 
objectid obid,  
symbolstartdate sdate,
symbolenddate edate,
activeflag aflag
from ciqgvkeyiid


3.3 where

  • 選擇資料中的其中一個資料
select * from table_name
where condition1
  • 範例: 選取iid欄位中都是01的資料
select * from ciqgvkeyiid
where iid = '01'


3.4 order by

  • 讓其中一欄照順序排列
select * from table_name
order by column1 ASC -- 從小到大
select * from table_name
order by column1 DESC -- 從大到小
select * from table_name
order by column1 ASC, column2 ASC -- 先排column1y在排column2
  • 範例:
select * from ciqgvkeyiid
order by gvkey ASC

select * from ciqgvkeyiid
order by gvkey DESC


3.5 join

  • 連接兩張以上的表格
select table_name1.column1, table_name2.column2 
from table_name1
join table_name2 on condition1
select table_name1.column1, table_name2.column2 
from table_name1
join table_name2 on condition1 and condition2 -- 可加不只一個條件
select table_name1.column1, table_name2.column2, table_name3.column3 
from table_name1
join table_name2 on condition1
join table_name3 on condition2 -- 可加不只一張表格
  • 範例:
select ciqgvkeyiid.gvkey,
spratingidentifier.relatedcompanyid
from ciqgvkeyiid
join spratingidentifier on spratingidentifier.relatedcompanyid = ciqgvkeyiid.relatedcompanyid




三、資料庫使用

1. 原理

想抓出的資料有:

  • gvkey (ciqgvkeyiid.gvkey)
  • rating data (spratingdata.ratingsymbol)
  • rating date (spratingdata.ratingdate)
  • country (spEntityLevelData.datavalue)

2. 流程

  • 首先必須知道這些欄位分別在哪些資料表裡
  • 找到連接資料表的代號
  • 利用Query Editor將資料串一起

3. Query指令

select distinct

GVK.gvkey,

RD.ratingsymbol as splticrm,

(date_trunc('MONTH', date(RD.ratingdate)) + INTERVAL '1 MONTH - 1 day')::DATE as datadate,

ELD.datavalue as country

from spratingdata as RD

join spEntityLevelData as ELD on RD.entitySymbolvalue = ELD.entitySymbolvalue and ratingDataItemId = 21 --21是國家代碼可抓出所有國家

join spratingidentifier RI on RI.symbolvalue = RD.entitysymbolvalue and symboltypeid = 73 -- CompanyID

join ciqgvkeyiid as GVK on RI.relatedcompanyid = GVK.relatedcompanyid

where ratingdate between '1990-1-1' and '2022-7-31' and ELD.datavalue='USA' and RD.ratingTypeCode='FCLONG'

order by GVK.gvkey ASC, datadate ASC

  • 最終呈現資料

4. 資料整理

  • 因為每個月底評級如果沒有變化,在資料表終將不會顯示,所以必須用R來補上每個月的資料
  • 基本概念是將每一個被評級的id抓出來,分別進行資料的補上再合併
# read the rating data
sprating = read.csv('spratingData.csv')
sprating$gvkey = as.character(sprating$gvkey)# change the data type into character from integer
sprating$datadate = as.Date(sprating$datadate)# change the data type into the date from character

# take all of the gvkey
gvkey = unique(sprating$gvkey)

# build a function to transform the data
FillMisMonthData = function(df){
  gvkeyid = df$gvkey[1]
  library(dplyr)
  # construct a matrix with all of the ends of the month between "1990-01-01" and "2022-07-31"
  ts <- seq(as.Date("1990-02-01"),length=391,by="months")-1
  df_time <- data.frame(datadate=ts)
  data_eotm <- suppressMessages(full_join(df_time,df))
  # fill the same data like gvkey and country
  data_eotm$gvkey = gvkeyid
  data_eotm$country = 'USA'
  library(tidyr)
  # fill the rating data with the last seen
  data_eotm = data_eotm %>% fill(splticrm, .direction = 'down')
  # switch the order in a data frame
  data_eotm_fill = select(data_eotm, gvkey, splticrm, datadate, country)
  return(data_eotm_fill)
}

# use the function on all gvkey, and bind together
sprating_fillmismonth = c()
library(progress)
pb <- progress_bar$new(total = length(gvkey))
for(id in gvkey){
  sprating_gvkey = sprating[sprating$gvkey==id,]
  sprating_gvkey = FillMisMonthData(sprating_gvkey)
  sprating_fillmismonth = rbind(sprating_fillmismonth, sprating_gvkey)
  pb$tick()
  Sys.sleep(1/length(gvkey))
}

# save the file to txt or csv
write.table(sprating_fillmismonth, 'D:/spGlobal/sprating_EndOfTheMonth.txt', na = "", row.names=FALSE, quote = F, sep = "\t")

5. 最終呈現的資料